10. Dummy Variables

Suppose you want to add a categorical variable to a regression model that records the color of a street light: red, yellow, or green. How many dummy variable columns will be added to your linear regression model?

SOLUTION: 2
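
To see why the answer is 2, here is a minimal sketch of dummy encoding in plain Python. The helper name `dummy_encode`, the sample observations, and the choice of green as the baseline are assumptions made for illustration; any one level could serve as the baseline.

```python
def dummy_encode(values, baseline):
    """Return (column_names, rows) with one 0/1 column per non-baseline level."""
    levels = sorted(set(values))
    # Drop the baseline level; every other level gets its own column.
    kept = [lvl for lvl in levels if lvl != baseline]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Hypothetical observations of the light's color.
colors = ["red", "yellow", "green", "green", "red"]
columns, rows = dummy_encode(colors, baseline="green")

print(columns)  # ['red', 'yellow']
for color, row in zip(colors, rows):
    print(color, row)
```

A green observation is encoded as a 0 in both columns, so the third level needs no column of its own.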

If you have a categorical variable with two levels, yes or no, how many dummy variables would need to be added to a linear model to use this variable?

SOLUTION: 1

Imagine you own a restaurant, and you have a rating scale of 'great', 'good', 'okay', 'poor', or 'awful'. You would like to understand how the tip given depends on this rating, so you build a linear model, using dummy variables to represent the ratings. How many total coefficients are in your model?

SOLUTION: 5 (an intercept plus 4 dummy variables, one for each rating level except the baseline)
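
The coefficient count can be checked with a small sketch using NumPy. The tip amounts below are made-up numbers for illustration only, and 'awful' is arbitrarily chosen as the baseline level; neither comes from the quiz itself.

```python
import numpy as np

# Hypothetical data: two made-up tips for each rating level.
ratings = ["great", "good", "okay", "poor", "awful",
           "great", "good", "okay", "poor", "awful"]
tips = np.array([9.0, 7.0, 5.0, 3.0, 1.0, 11.0, 8.0, 6.0, 4.0, 2.0])

levels = ["awful", "poor", "okay", "good", "great"]
baseline = "awful"  # assumed baseline; its dummy column is dropped
kept = [lvl for lvl in levels if lvl != baseline]

# Design matrix: a column of ones for the intercept, then 4 dummy columns.
intercept = np.ones((len(ratings), 1))
dummies = np.array([[1.0 if r == lvl else 0.0 for lvl in kept] for r in ratings])
X = np.hstack([intercept, dummies])

# Ordinary least squares fit; one coefficient per column of X.
coefs, _, rank, _ = np.linalg.lstsq(X, tips, rcond=None)

print(X.shape[1])  # 5 columns, so 5 coefficients
print(dict(zip(["intercept"] + kept, coefs.round(2))))
```

In this setup the intercept is the mean tip for the baseline rating, and each dummy coefficient is the difference between that level's mean tip and the baseline's.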

Which of the following are true regarding the dummy variables we add to our multiple linear regression models? Let X be the X matrix as defined in the previous screencast. Mark all that are true.

SOLUTION:
  • For each categorical variable, the number of dummy variables added to your X matrix should always be the number of levels of that variable minus 1.
  • The reason for dropping a dummy variable is to ensure that all of our columns are linearly independent.
  • The reason for dropping a dummy variable is to ensure that X'X is invertible.
  • The reason for dropping a dummy variable is to ensure that your X matrix is full rank.
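
The last three statements describe the same fact from different angles, which can be illustrated with a short NumPy sketch. The helper `design_matrix` and the sample colors are hypothetical, and the sketch assumes the model includes an intercept column.

```python
import numpy as np

# Hypothetical observations and the three levels of the variable.
colors = ["red", "yellow", "green", "green", "red", "yellow"]
levels = ["red", "yellow", "green"]


def design_matrix(values, dummy_levels):
    """Build X as an intercept column followed by one 0/1 column per given level."""
    intercept = np.ones((len(values), 1))
    dummies = np.array(
        [[1.0 if v == lvl else 0.0 for lvl in dummy_levels] for v in values]
    )
    return np.hstack([intercept, dummies])


X_all = design_matrix(colors, levels)        # intercept + 3 dummies (none dropped)
X_drop = design_matrix(colors, levels[:-1])  # intercept + 2 dummies (green dropped)

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)

# With every dummy kept, the dummy columns sum to the intercept column,
# so the columns are linearly dependent and X is not full rank.
print(X_all.shape[1], rank_all)    # 4 columns, rank 3
print(X_drop.shape[1], rank_drop)  # 3 columns, rank 3

# X'X is invertible only when X has full column rank.
gram_rank_all = np.linalg.matrix_rank(X_all.T @ X_all)
gram_rank_drop = np.linalg.matrix_rank(X_drop.T @ X_drop)
print(gram_rank_all, gram_rank_drop)
```

Dropping one dummy removes the redundant column, which is what makes X'X invertible and the least squares solution unique.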